CLAIMS 



.claim: 

1. A cache coherency system for a shared memory parallel
processing system including a plurality of processing nodes,
comprising: a multi-stage communication network for
interconnecting said processing nodes; each said processing
node including one or more caches for storing a plurality of
cache lines; and a cache coherency directory which is
distributed to each of said nodes for tracking which of said
nodes have copies of each cache line.



2. A shared memory parallel processing system including a
plurality of processing nodes, comprising:

a multi-stage communication network for interconnecting
said processing nodes, said network including a
plurality of self-routing switches cascaded into first,
middle and last stages, each said switch including a
plurality of switch inputs and a plurality of switch
outputs, each of said switch outputs of each said
switch coupled to a different switch input of others of
said switches, switch outputs of said last stage
switches including network output ports, and switch
inputs of said first stage switches comprising network
input ports;

each processing node including:

a network adapter for transmitting and receiving
messages with respect to other processing nodes
over said network;

a local processor;

at least one private write-through cache;

a section of shared memory organized into a
plurality of cache lines, each cache line
including one or more addressable memory
locations;

a cache coherency directory for tracking which of
said nodes have copies of each cache line;

said local processor at a first processing node being
operable for writing data to said private cache at said
first node, as the same data is written to either
shared memory at said first node or sent over said
network for writing to the shared memory and private
cache of a second processing node.

3. The shared memory parallel processing system of
claim 2, wherein said section of shared memory is divided
into first and second portions, said first portion for
storing unchangeable data, and said second portion for
storing changeable data.

4. The shared memory parallel processing system of
claim 3, said cache coherency directory for this processing
node listing which nodes of the plurality of nodes have
accessed copies of said cache lines in said second portion
of shared memory at this processing node.
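
The directory recited in claims 1 through 4 can be pictured as a per-node table that maps each local cache line to the set of nodes that have accessed a copy. The following is a minimal sketch under stated assumptions, not the claimed implementation: the class name, method names, and address ranges are hypothetical, and tracking is limited to the changeable (second) portion as claim 4 describes.

```python
from collections import defaultdict


class CoherencyDirectory:
    """Per-node directory: which nodes hold copies of each local cache line."""

    def __init__(self, changeable_lines):
        # Only lines in the changeable (second) portion are tracked.
        self.changeable_lines = set(changeable_lines)
        self.sharers = defaultdict(set)

    def record_access(self, line_addr, node_id):
        """Note that node_id accessed a copy of line_addr."""
        if line_addr in self.changeable_lines:
            self.sharers[line_addr].add(node_id)

    def nodes_with_copy(self, line_addr):
        """Return the node IDs currently listed for line_addr, sorted."""
        return sorted(self.sharers.get(line_addr, ()))


# Hypothetical layout: addresses 0x100-0x1FF form the changeable portion.
directory = CoherencyDirectory(changeable_lines=range(0x100, 0x200))
directory.record_access(0x120, node_id=3)
directory.record_access(0x120, node_id=7)
directory.record_access(0x010, node_id=3)  # unchangeable portion: not tracked
```

Unchangeable data never needs invalidation, which is why the sketch skips it entirely.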



5. The shared memory parallel processing system of
claim 4, wherein each processing node is operable for
reading, storing, and invalidating the shared memory at any
of said plurality of processing nodes selectively by
transmitting and receiving messages over said network, a
first message type for requesting the read of a cache line,
a second message type for returning the requested cache
line, a third message type for storing a cache line, and a
fourth message type for invalidating a cache line.
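
The four message types of claim 5 can be illustrated with a small dispatcher. This is a sketch only: the numeric codes, the `handle` function, and the use of plain dictionaries for memory and cache are assumptions made for illustration and do not come from the claims.

```python
from enum import IntEnum


class MessageType(IntEnum):
    READ_REQUEST = 1    # first type: request the read of a cache line
    READ_RESPONSE = 2   # second type: return the requested cache line
    STORE = 3           # third type: store a cache line
    INVALIDATE = 4      # fourth type: invalidate a cache line


def handle(message_type, memory, cache, line_addr, data=None):
    """Apply one received message to a node's memory and cache dictionaries."""
    if message_type is MessageType.READ_REQUEST:
        # Answer with a response message carrying the requested line.
        return (MessageType.READ_RESPONSE, memory[line_addr])
    if message_type is MessageType.READ_RESPONSE:
        cache[line_addr] = data
    elif message_type is MessageType.STORE:
        memory[line_addr] = data
    elif message_type is MessageType.INVALIDATE:
        cache.pop(line_addr, None)
    return None
```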



6. The shared memory parallel processing system of
claim 5, said network adapter further comprising:

a first buffer for transmitting to said network shared
memory read command messages of said first message type
and said second message type;

a second buffer for transmitting to said network shared
memory store command messages of said third message
type;

a third buffer for transmitting to said network
invalidate messages for said cache coherency directory
of said fourth message type;

a fourth buffer for receiving from said network shared
memory read command messages of said first message type
and said second message type;

a fifth buffer for receiving from said network shared
memory store command messages of said third message
type; and

a sixth buffer for receiving from said network
invalidate messages for said cache coherency directory
of said fourth message type.

7. A shared memory parallel processing system,
comprising:

a plurality of nodes, each node including a node
memory, at least one cache, and a memory controller;

a multi-stage switching network for
interconnecting said processing nodes, said switching
network including a plurality of self-routing switches
cascaded into first, middle and last stages, each said
switch including a plurality of switch inputs and a
plurality of switch outputs, each of said switch
outputs of each said switch coupled to a different
switch input of others of said switches, switch outputs
of said last stage switches including network output
ports, and switch inputs of said first stage switches
comprising network input ports;

a system memory distributed to said node memories
of said plurality of nodes and accessible by any node;
each said node memory being organized into a plurality
of addressable word locations;

said memory controller at this node operable for
performing local memory access to the portion of system
memory at this node and for performing remote memory
access over said network to the portion of system
memory at other nodes; and

a cache coherency controller at this node being
responsive to both local memory accesses and remote
memory accesses to data stored in a word location of
said node memory at this node for caching accessed data
in the cache of this node and for communicating data
for assuring cache coherency throughout said system
over said network.

8. The shared memory processing system of claim 7,
said system memory being distributed in equal portions to
each said node memory; and said node memory being further
sub-divided into a first memory section for storing data
that is changeable and a second memory section for storing
data that is unchangeable.

9. The shared memory processing system of claim 7,
further comprising node indicia for uniquely identifying
each node.

10. The shared memory processing system of claim 7,
said cache coherency controller further comprising:

an invalidation directory for storing a list of node
indicia identifying those nodes having accessed a copy
of each said cache line of node memory since the last
time the cache line was changed.

11. The shared memory processing system of claim 10,
said cache coherency controller further comprising:

an overflow directory for expanding said invalidation
directory when the list of node indicia for a cache
line becomes too long to contain entirely within said
invalidation directory.
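
The relationship between the invalidation directory of claim 10 and the overflow directory of claim 11 can be sketched as a bounded list per cache line that spills into a second table. The slot count, class name, and method names below are hypothetical; the claims do not specify how many node indicia fit in one directory entry.

```python
class InvalidationDirectory:
    """Sharer list per cache line, spilling into an overflow directory."""

    def __init__(self, slots_per_line=2):
        self.slots_per_line = slots_per_line  # assumed fixed entry width
        self.primary = {}    # line address -> list of node IDs (bounded)
        self.overflow = {}   # line address -> list of node IDs (spill area)

    def record_access(self, line_addr, node_id):
        """List node_id as having accessed a copy of line_addr."""
        entry = self.primary.setdefault(line_addr, [])
        spill = self.overflow.get(line_addr, [])
        if node_id in entry or node_id in spill:
            return
        if len(entry) < self.slots_per_line:
            entry.append(node_id)
        else:
            # The primary entry is full, so expand into the overflow directory.
            self.overflow.setdefault(line_addr, []).append(node_id)

    def line_changed(self, line_addr):
        """Return every node to invalidate and reset the list for the line."""
        return self.primary.pop(line_addr, []) + self.overflow.pop(line_addr, [])
```

Resetting the list on a change mirrors the "since the last time the cache line was changed" wording of claim 10.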
12. A shared memory parallel processing system,

comprising:

a plurality of nodes, each node including a node
memory, at least one cache, and a memory controller;

a multi-stage switching network for interconnecting
said processing nodes, said switching network including
a plurality of self-routing switches cascaded into
first, middle and last stages, each said switch
including a plurality of switch inputs and a plurality
of switch outputs, each of said switch outputs of each
said switch coupled to a different switch input of
others of said switches, switch outputs of said last
stage switches including network output ports, and
switch inputs of said first stage switches comprising
network input ports; and

a network adapter responsive to a node connection
request for establishing a connection path to a target
node, first by attempting to establish a quick
connection path across a plurality of segments of said
switching network to said target node, and upon
determining any one of said plurality of segments is
not available, issuing a camp-on connection request to
said target node.
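
The two-step connection policy of claim 12 (quick path first, camp-on as the fallback) reduces to a simple decision. The sketch below is illustrative only: the `connect` function, its boolean segment list, and the returned tuple are hypothetical, and the waiting behaviour of a camp-on request is an assumption rather than something the claim spells out.

```python
def connect(segments_busy, target):
    """Try a quick path first; fall back to a camp-on request.

    segments_busy holds one boolean per network segment on the path
    (True means the segment is currently in use).
    Returns a (mode, target) tuple naming the request that is issued.
    """
    # Quick attempt: usable only if every segment is free right now.
    if not any(segments_busy):
        return ("quick", target)
    # At least one segment is unavailable, so issue a camp-on request.
    # Assumption: a camp-on request waits at the busy segment until it frees.
    return ("camp-on", target)
```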
13. The shared memory parallel processing system of

claim 12, further comprising:

said plurality of nodes each coupled to one of the
network output ports and to one of the network input
ports;

each node further including:

receive means for receiving a data message; and

send means for sending a data message across an
n-stage switching network from a local node to a
remote node, said send means generating said
connection request including n sequential
connection commands, each sequential connection
command selecting one of said plurality of
connection segments for each of the n switch
stages of said network.

14. The shared memory parallel processing system of
claim 12, each said switch being responsive to node
connection requests and camp-on connection requests for
establishing connection segments from any switch input port
to any switch output ports.

15. The shared memory parallel processing system of
claim 14, each said switch further comprising:

a data bus for transferring said data message;

a rejection control line for signalling back to a
sending node a rejection of any connection request;

an acceptance control line for signalling back to said
sending node the acceptance of a camp-on connection
request;

a valid control line for receiving from said sending
node the activation of a node connection request; and

a camp-on control line for receiving from said sending
node the activation of a camp-on connection request.

16. A bi-directional network adapter for interfacing a
local node of a shared memory parallel processing system to
a multi-stage switching network for communications with a
remote node, each said node including a node memory
including a changeable portion and an unchangeable portion,
and a node cache; said network adapter comprising:

a plurality of send buffers for storing and forwarding
data messages from said local node to said remote node
over said network, and

a plurality of receive buffers for storing and
forwarding a plurality of data messages from said
remote node to said local node over said multi-stage
network;

said data messages including:

an invalidation message for invalidating a cache
line that was accessed by a remote node after said
cache line has changed;

a read request message for requesting access of a
cache line from a remote node;

a response message for returning a cache line over
the network to a remote node that has previously
requested data by a read request message; and

a store message storing a changed cache line to a
remote node.

17. The network adapter of claim 16, said data
messages further including a message header comprising:

message type differentiation indicia;

destination node indicia for identifying a node for
receiving said data message over said network;

source node indicia for identifying a node for
transmitting said data message over said network;

message length indicia for defining the variable number
of words included in said data message;

memory area indicia for defining whether memory words
included in said data message are read from said
changeable area;

time indicia for defining the time of generation of
said data message; and

memory address indicia for defining the address
location in memory of the memory word included in said
data message.
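
The seven header fields listed in claim 17 map naturally onto a record type. The following sketch is an assumption-laden illustration: the field names, their integer and boolean types, and the `reply_header` helper are hypothetical, and the claim does not give field widths or encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageHeader:
    message_type: int       # message type differentiation indicia
    destination_node: int   # node that receives the message
    source_node: int        # node that transmits the message
    length_words: int       # variable number of words in the message
    changeable_area: bool   # whether the words come from the changeable area
    timestamp: int          # time of generation of the message
    memory_address: int     # address of the memory word in the message

    def reply_header(self, message_type, length_words, timestamp):
        """Build a header addressed back to the sender of this message."""
        return MessageHeader(message_type, self.source_node,
                             self.destination_node, length_words,
                             self.changeable_area, timestamp,
                             self.memory_address)
```

Swapping source and destination in `reply_header` shows how a response message can be routed back to the node that issued a read request.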



EM997080 






18. The network adapter of claim 17, said send buffers
further comprising:

a read send FIFO for storing and forwarding read
request messages and response messages from said local
node to said remote node;

a store send FIFO for storing and forwarding store
messages from said local node to said remote node; and

an invalidation send FIFO for storing and forwarding
invalidation messages from said local node to said
remote node;

and said receive buffers further comprising:

a read receive FIFO for storing and forwarding read
request messages and response messages from said remote
node to said local node;

a store receive FIFO for storing and forwarding store
messages from said remote node to said local node; and

an invalidation receive FIFO for storing and forwarding
invalidation messages from said remote node to said
local node.

19. The network adapter of claim 18, further
comprising:

a send FIFO selection means for prioritizing the
selection of a data message from one of said three send
FIFO means for transmission to said network by first
selecting data messages from said invalidation send
FIFO and thereafter alternatively selecting data
messages from said read and store send FIFOs;

a receive FIFO selection means responsive to said
message type differentiation indicia for selecting one
of said three receive FIFO means for storing a data
message received from said network; and

said network adapter being responsive to a node
connection request for establishing a connection path
to a target node, first by attempting to establish a
quick connection path across a plurality of segments of
said switching network to said target node, and upon
determining any one of said plurality of segments is
not available, issuing a camp-on connection request to
said target node.
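
The send FIFO selection of claim 19 gives invalidation messages priority and then alternates between the read and store FIFOs. A minimal sketch of that ordering follows; the class and attribute names are hypothetical, and the choice to skip an empty FIFO rather than wait for it is an assumption.

```python
from collections import deque


class SendSelector:
    """Pick the next outgoing message from three send FIFOs."""

    def __init__(self):
        self.invalidation = deque()
        self.read = deque()
        self.store = deque()
        self._read_turn = True  # which of read/store is tried first next

    def next_message(self):
        """Return the next message to transmit, or None if all are empty."""
        # Invalidation messages are always selected first.
        if self.invalidation:
            return self.invalidation.popleft()
        # Then alternate between read and store, skipping an empty FIFO.
        if self._read_turn:
            order = (self.read, self.store)
        else:
            order = (self.store, self.read)
        for fifo in order:
            if fifo:
                # The other FIFO gets the first try on the next call.
                self._read_turn = fifo is self.store
                return fifo.popleft()
        return None
```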
20. A memory controller for a local node of a shared

memory parallel processing system, said node including a
node processor, a node memory, a node cache and an I/O
adapter, said system including a multi-stage switching
network for communications amongst said local node and a
plurality of remote nodes, said node memory including a
changeable portion and an unchangeable portion; said memory
controller comprising:

first means responsive to a request by said processor
for access to a memory word for first accessing said
node cache of said local node; and

second means responsive to said first means being
unable to access said memory word in said node cache
for accessing said memory word selectively from a cache
line in said node memory or remote memory and storing
said cache line to said node cache.
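
The access order in claim 20 (node cache first, then local or remote memory, then fill the cache) can be sketched as follows. Several simplifications are assumptions: each cache line is treated as a single word, the home node is derived by dividing the address by an equal per-node share, and `fetch_remote` is a hypothetical callback standing in for the network path.

```python
class NodeMemoryController:
    """Cache-first access with fallback to local or remote memory."""

    def __init__(self, node_id, words_per_node, local_memory, fetch_remote):
        self.node_id = node_id
        self.words_per_node = words_per_node  # assumed equal split per node
        self.local_memory = local_memory      # dict: address -> word
        self.fetch_remote = fetch_remote      # callable(home_node, address)
        self.cache = {}

    def read(self, address):
        # First means: try the node cache.
        if address in self.cache:
            return self.cache[address]
        # Second means: on a miss, pick local or remote memory by address.
        home_node = address // self.words_per_node
        if home_node == self.node_id:
            word = self.local_memory[address]
        else:
            word = self.fetch_remote(home_node, address)
        # Store the fetched data in the node cache for later accesses.
        self.cache[address] = word
        return word
```

A second read of the same address is served from the cache, so the remote fetch happens only once per line.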



21. The memory controller of claim 20, further
comprising:

remote fetch interrupt means for issuing an interrupt
signal to said node processor upon determining that a
requested memory word is located in remote memory for
causing said node processor to switch from a first
instruction stream thread to a second instruction
stream thread.

22. The memory controller of claim 20, further
comprising:

data message generation means responsive to a request
from a remote node for accessing a cache line
identified by a remote request read address for
generating a read response message to return the
accessed cache line to said remote node, said read
response message including a message header comprising

message differentiation indicia for defining said
read request message type;

destination node indicia equal to the sector
segment of said node memory for said addressed
memory word;

source node indicia set to the node ID number of
the local node;

message length indicia for defining said read
request message as being comprised of said message
header only; and

memory address indicia for specifying the memory
address of said memory word;

said data message generation means further operable for
delivering said read response message to a read send
FIFO of said network adapter for transmission to said
network and the remote node selected by said
destination node indicia.

23. The memory controller of claim 20, further
comprising:

an invalidation directory;

cast-out means for deleting a cache line from said node
cache when said cache is full to provide space for a
new cache line to be stored to said cache; and for
sending the address of the deleted cache line to said
invalidation directory to indicate said node no longer
has a copy of said cache line.
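
The cast-out behaviour of claim 23 can be sketched as a fixed-capacity cache that reports each evicted line to an invalidation directory. The replacement policy (oldest line first), the class name, and the dictionary-based directory are all assumptions for illustration; the claim only requires that a line be deleted when the cache is full and that the directory be told.

```python
from collections import OrderedDict


class NodeCache:
    """Fixed-capacity cache that reports cast-outs to a directory."""

    def __init__(self, capacity, node_id, directory):
        self.capacity = capacity
        self.node_id = node_id
        self.directory = directory  # dict: line address -> set of node IDs
        self.lines = OrderedDict()

    def store(self, line_addr, data):
        """Store a line, casting out the oldest line if the cache is full."""
        if line_addr not in self.lines and len(self.lines) >= self.capacity:
            # Cast out the oldest line (replacement policy is an assumption).
            victim, _ = self.lines.popitem(last=False)
            # Tell the directory this node no longer holds the victim.
            self.directory.get(victim, set()).discard(self.node_id)
        self.lines[line_addr] = data
        self.directory.setdefault(line_addr, set()).add(self.node_id)
```

Removing the node from the directory on cast-out avoids sending it an invalidation later for a line it no longer holds.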



24. The memory controller of claim 23, further
comprising:

cast-out message generation means responsive to said
cast-out means deleting a cache line addressed to a
remote node for generating a cast-out message to said
remote node to send the cast-out address and the local
node ID number to said remote node over said network;

cast-out message receiving means for delivering a
cast-out address and the source node ID number from the
message header of a cast-out message to said
invalidation directory.

25. The memory controller of claim 20, further
comprising:

cache copy update means for sending cache update
messages to update corresponding cache lines at all remote
nodes having copies of a changed cache line; and

cache update message receiving means for replacing a
cache line of data with an updated cache line of data
received from a remote node.

26. The bi-directional network adapter of claim 16,
said data messages further comprising:

a cast-out message for invalidating an invalidation
directory entry at a remote node for this local node;

a cache copy update message for updating copies of a
changed cache line at this local node at remote nodes
having copies of said changed cache line; and

a node indicia assignment message for sending a
different node number to each of the plurality of nodes
of the system.

27. A method for operating a memory controller for a
local node of a shared memory parallel processing system,
said node including a node processor, a node memory, a node
cache and an I/O adapter, said system including a
multi-stage switching network for communications amongst said
local node and a plurality of remote nodes, said node memory
including a changeable portion and an unchangeable portion;
the method comprising the steps of:

responsive to a request by said processor for access to
a memory word, accessing said node cache of said local
node; and thereafter

responsive to said first means being unable to access
said memory word in said node cache, accessing said
memory word selectively from a cache line in said node
memory or remote memory and storing said cache line to
said node cache.

28. The method of claim 27, further comprising the
step of:

issuing an interrupt signal to said node processor upon
determining that a requested memory word is located in
remote memory for causing said node processor to switch
from a first instruction stream thread to a second
instruction stream thread.

29. A method for operating a bi-directional network
adapter for interfacing a local node of a shared memory
parallel processing system to a multi-stage switching
network for communications with a remote node, each said
node including a node memory including a changeable portion
and an unchangeable portion, and a node cache; comprising
the steps of:

operating a plurality of send buffers for storing and
forwarding data messages from said local node to said
remote node over said network, and

operating a plurality of receive buffers for storing
and forwarding a plurality of data messages from said
remote node to said local node over said multi-stage
network;

said data messages including:

an invalidation message for invalidating a cache
line that was accessed by a remote node after said
cache line has changed;

a read request message for requesting access of a
cache line from a remote node;

a response message for returning a cache line over
the network to a remote node that has previously
requested data by a read request message; and

a store message storing a changed cache line to a
remote node.

30. The method of claim 29, further comprising the
steps of:

operating a read send FIFO for storing and forwarding
read request messages and response messages from said
local node to said remote node;

operating a store send FIFO for storing and forwarding
store messages from said local node to said remote
node; and

operating an invalidation send FIFO for storing and
forwarding invalidation messages from said local node
to said remote node;

operating a read receive FIFO for storing and
forwarding read request messages and response messages
from said remote node to said local node;

operating a store receive FIFO for storing and
forwarding store messages from said remote node to said
local node; and

operating an invalidation receive FIFO for storing and
forwarding invalidation messages from said remote node
to said local node.